# **FPGA Implementation Documentation – ALU & Memory**

#### **Abstract**

This work presents the intermediate implementation of two digital design modules, a memory block and an Arithmetic Logic Unit (ALU), on the **Basys 3 Artix-7 FPGA** using Vivado.

The memory module was designed to store and retrieve 16-bit data words at specified addresses, with initialization and basic read/write operations verified through simulation as well as implementation on the FPGA.

The ALU module was implemented to perform a range of arithmetic and logical operations on two 16-bit operands, with status flags (zr, ng) displayed via the on-board seven-segment display.

Both modules were successfully synthesized, mapped to LUTs, and verified through testbenches. Preliminary quality of results (QoR) analysis shows efficient utilization of FPGA resources, no congestion, and all timing constraints satisfied, confirming the correctness and feasibility of the approach for further integration and testing.

## **ALU Implementation**

The Arithmetic Logic Unit (ALU) was implemented as a parameterized 16-bit module capable of performing a wide range of arithmetic and logical operations based on six control inputs (zx, nx, zy, ny, f, no). Two 16-bit operands are selected from pre-initialized memory locations using 5-bit address inputs, allowing flexible testing across positive, negative, and patterned data values. Depending on the opcode settings, the ALU supports operations such as zeroing inputs, bitwise negation, addition, subtraction, and logical AND, along with conditional negation of the output.
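The effect of the six control inputs can be illustrated with a small behavioral model. This is a Python sketch, not the Verilog source. It assumes that the control bits are applied in the order zx, nx, zy, ny, f, no, and that f = 1 selects addition while f = 0 selects AND; the function name `alu` is hypothetical.

```python
MASK = 0xFFFF  # 16-bit data path


def alu(x, y, zx, nx, zy, ny, f, no):
    """Return the 16-bit ALU result for operands x, y and six control bits.

    Assumed order of operations: zero, then negate each input,
    then add or AND, then optionally negate the output.
    """
    if zx:
        x = 0
    if nx:
        x = ~x & MASK          # bitwise negation, kept to 16 bits
    if zy:
        y = 0
    if ny:
        y = ~y & MASK
    out = (x + y) & MASK if f else (x & y)   # assumption: f=1 adds, f=0 ANDs
    if no:
        out = ~out & MASK
    return out


# Subtraction x - y falls out of nx=1, f=1, no=1 in two's complement.
print(hex(alu(5, 3, zx=0, nx=1, zy=0, ny=0, f=1, no=1)))  # prints 0x2
```

Under these assumptions, subtraction needs no dedicated subtractor: negating x, adding y, and negating the sum gives x - y.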

To provide status feedback, the ALU generates two condition flags: zero (zr), which is asserted when the result equals zero, and negative (ng), which indicates a negative output in two's complement form. These flags are also mapped to the on-board seven-segment display, offering real-time visualization during FPGA testing. The ALU design was simulated extensively using binary test vectors, then synthesized and mapped successfully to the FPGA with correct functional verification.
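The two flags can be derived from the 16-bit result alone. The following Python sketch shows one way to express that; the helper name `flags` is hypothetical and the code is a reference model rather than the synthesized logic.

```python
def flags(result):
    """Return (zr, ng) for a 16-bit result interpreted in two's complement."""
    result &= 0xFFFF
    zr = int(result == 0)      # zero flag: every bit of the result is 0
    ng = (result >> 15) & 1    # negative flag: the sign bit (bit 15)
    return zr, ng


print(flags(0x0000))  # prints (1, 0)
print(flags(0x8000))  # prints (0, 1)
```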

The ALU was implemented using a **hardcoded register initialization strategy**. Instead of directly providing operands, two memory arrays were defined with 16-bit values preloaded at specific addresses. During execution, the ALU receives only the operand addresses, retrieves the corresponding data, and performs the required computation. This approach streamlined operand handling and enabled testing across a wide set of positive, negative, and patterned values.
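The address-based operand fetch can be sketched as follows. The preloaded values are hypothetical placeholders chosen to cover positive, negative, and patterned data; the actual initialization values live in the design files. The names `mem_a`, `mem_b`, and `fetch_operands` are assumptions made for illustration.

```python
ADDR_BITS = 5
DEPTH = 1 << ADDR_BITS   # 32 locations per operand memory


def to_word(value):
    """Wrap a Python integer into a 16-bit two's complement word."""
    return value & 0xFFFF


# Hypothetical preload patterns, not the values used in the real design.
mem_a = [0] * DEPTH
mem_b = [0] * DEPTH
for addr, value in {0: 0, 1: 1, 2: -1, 3: 0x5555, 4: 0x7FFF}.items():
    mem_a[addr] = to_word(value)
for addr, value in {0: 0, 1: 2, 2: -2, 3: 0xAAAA, 4: -32768}.items():
    mem_b[addr] = to_word(value)


def fetch_operands(mem_address_a, mem_address_b):
    """Return the (a, b) operand pair selected by two 5-bit addresses."""
    a = mem_a[mem_address_a & (DEPTH - 1)]
    b = mem_b[mem_address_b & (DEPTH - 1)]
    return a, b


print([hex(v) for v in fetch_operands(2, 3)])  # prints ['0xffff', '0xaaaa']
```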

## **Block Diagram**



Figure 1: Block diagram of the ALU

The top-level design integrates an ALU with memory arrays and a display unit. Two 5-bit address inputs (mem\_address\_a and mem\_address\_b) select operands from internal memory blocks, which are then fed into the ALU along with control signals. The ALU computes the result and generates status flags (zr, ng), which are passed to the display module for visualization on the seven-segment display (seg, an).

## **IO Table**

| Direction | Name          | Width | Comments                                          |
|-----------|---------------|-------|---------------------------------------------------|
| Input     | clk           | 1     | Clock input to the registers                      |
| Input     | initialise    | 1     | Initializes the registers with the values         |
| Input     | mem_address_a | 5     | Address of the operand 'a'                        |
| Input     | mem_address_b | 5     | Address of the operand 'b'                        |
| Input     | zx            | 1     | Zeros input x                                     |
| Input     | nx            | 1     | Negates input x                                   |
| Input     | zy            | 1     | Zeros input y                                     |
| Input     | ny            | 1     | Negates input y                                   |
| Input     | f             | 1     | To choose between AND and Add operations          |
| Input     | no            | 1     | Negates the output                                |
| Output    | result        | 16    | Result of the ALU operation                       |
| Output    | seg           | 7     | Drives the flags onto the seven-segment display   |
| Output    | an            | 4     | Selects the active display digit                  |

## **Quality of Result**

#### Congestion

The congestion analysis shows no congestion windows above level 5, indicating that the design is well placed and routed without routing bottlenecks. This confirms efficient logic distribution across the FPGA fabric.

```
Command      : report_design_analysis -congestion
Design       : ALU_FPGA
Device       : xc7a35t
Design State : Routed

Report Design Analysis

Table of Contents
1. Placer Final Level Congestion Reporting

1. Placer Final Level Congestion Reporting
| Direction | Type | Level | Congestion | Window | Combined LUTs | Avg LUT Input | LUT | LUTRAM | Flop | MUXF | RAMB | DSP | CARRY | SRL | Cell Names |
No congestion windows are found above level 5

| Direction | Type | Level | Percentage Tiles | Window | Combined LUTs | Avg LUT Input | LUT | LUTRAM | Flop | MUXF | RAMB | DSP | CARRY | SRL | Cell Names |
No initial estimated congestion windows are found above level 5
```

Image 1: Congestion Report

#### **Resource Utilization**

The utilization summary shows the number of LUTs, flip-flops, and IO blocks consumed by the design. The values are well within the available resources of the Basys 3 Artix-7 board, confirming that the design fits comfortably on the FPGA with margin for further expansion.

#### 4. IO and GT Specific

| Site Type  | Used | Fixed | Prohibited | Available | Util% |
|------------|------|-------|------------|-----------|-------|
| Bonded IOB | 46   | 0     | 0          | 106       | 43.40 |

Image 2: IO Blocks Utilisation Summary

#### 1. Slice Logic

| Site Type             | Used | Fixed | Prohibited | Available | Util% |
|-----------------------|------|-------|------------|-----------|-------|
| Slice LUTs*           | 79   | 0     | 0          | 20800     | 0.38  |
| LUT as Logic          | 79   | 0     | 0          | 20800     | 0.38  |
| LUT as Memory         | 0    | 0     | 0          | 9600      | 0.00  |
| Slice Registers       | 4    | 0     | 0          | 41600     | <0.01 |
| Register as Flip Flop | 4    | 0     | 0          | 41600     | <0.01 |

Image 3: LUT, Register & Flipflop usage

#### 7. Primitives

| Ref Name | Used | Functional Category |
|----------|------|---------------------|
| LUT6     | 48   | LUT                 |
| LUT2     | 31   | LUT                 |
| OBUF     | 29   | IO                  |
| IBUF     | 17   | IO                  |
| LUT3     | 16   | LUT                 |
| LUT4     | 5    | LUT                 |
| CARRY4   | 4    | CarryLogic          |
| FDRE     | 3    | Flop & Latch        |
| LUT5     | 2    | LUT                 |
| LUT1     | 1    | LUT                 |
| FDSE     | 1    | Flop & Latch        |
| BUFG     | 1    | Clock               |

Image 4: List of all the primitives used

## **Timing Summary**

The timing report indicates that all paths meet the required constraints: the worst setup slack (WNS) is 7.033 ns, the worst hold slack (WHS) is 0.229 ns, and there are no failing endpoints. The design therefore achieves timing closure and can operate reliably at the specified clock frequency.

| Metric                      | Setup         | Hold          | Pulse Width    |
|-----------------------------|---------------|---------------|----------------|
| Worst slack                 | WNS: 7.033 ns | WHS: 0.229 ns | WPWS: 4.500 ns |
| Total negative slack        | TNS: 0.000 ns | THS: 0.000 ns | TPWS: 0.000 ns |
| Number of failing endpoints | 0             | 0             | 0              |
| Total number of endpoints   | 11            | 11            | 8              |

All user specified timing constraints are met.

Image 5: Timing Summary
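As a rough reading of the setup slack, the worst-case path delay and an upper bound on the clock frequency can be estimated from the clock period and the WNS. The sketch below assumes a 10 ns period, corresponding to the on-board 100 MHz clock. It ignores clock uncertainty and other margins, so the result is an estimate rather than a guaranteed operating frequency. The function name is hypothetical.

```python
def max_frequency_mhz(clock_period_ns, wns_ns):
    """Estimate the maximum clock frequency from period and worst setup slack."""
    critical_path_ns = clock_period_ns - wns_ns   # delay of the slowest path
    return 1000.0 / critical_path_ns              # ns period -> MHz


# Assumption: 10 ns period (100 MHz board clock); WNS from the timing summary.
estimate = max_frequency_mhz(10.0, 7.033)
print(f"{estimate:.0f} MHz")  # prints 337 MHz
```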

## **Screenshots**

### Elaborated Circuit Schematic



Image 6: Entire elaborated Circuit

Image 8: View of the register selecting mux, ALU and display\_flag module instantiations and outputs

### Mapping Diagram to LUTs (Synthesized design)



Image 9: Diagram showing the design being mapped to 6-input LUTs (ALU and display\_flags module instantiations)



Image 10: Diagram showing the ALU module instance being mapped onto LUTs and flipflops



Image 11: Diagram showing the display\_flags module instance being mapped onto LUTs and flipflops

#### **Constraints**

XDC file: Xilinx Design Constraint File for ALU Implementation

The design uses a Xilinx Design Constraint (XDC) file to specify pin assignments for the Basys 3 Artix-7 FPGA board. The file maps all the top-level input and output signals to the appropriate physical pins, including the on-board 100 MHz clock source that drives the design, pushbuttons, switches, LEDs, and the seven-segment display. The constraints ensure correct interfacing with the board peripherals. At this stage, **no additional timing, power, or design constraints** have been specified apart from the functional pin mappings.

## **Verification Strategy and Proof of Correctness**

The functionality of the ALU design was verified using a two-fold approach. First, a comprehensive Verilog testbench was developed to simulate various input combinations, including arithmetic, logical, and edge-case operations. The simulation results were compared against the expected outputs (e.g., zero flag, negative flag, and computed results) to confirm correctness. Second, the design was synthesized, implemented, and deployed on the FPGA board, where the outputs were observed directly through the LED indicators and seven-segment display. This combination of simulation and hardware testing ensured that the ALU behaved as intended under both controlled testbench conditions and real hardware execution.





Image 12: Behavioral Simulation of ALU\_FPGA module using a testbench

FPGA Implementation video: ALU Implementation video

#### Design Files:

1. Top Module
2. ALU module
3. Segmented Display Interface Module

## **Memory Implementation**

The memory module was implemented using a **register array** in Verilog, declared as reg [15:0] RAM16K\_SCREEN [0:1023], which provides a storage capacity of 1024 words, each 16 bits wide. The design supports both read and write operations.

Writing takes two steps. First, a pushbutton latches an address from the input switches. Then, on the rising edge of the clock with the write control signal asserted, the 16-bit input data is stored in the corresponding memory location. Reading is achieved by continuously driving the output with the contents of the currently latched address, so the output always reflects the data stored at the selected memory location.
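The latch-then-write sequence can be modeled behaviorally as below. This is a Python sketch, not the Verilog module. It assumes that the address is taken from the low 10 bits of the switches (enough for 1024 words) and that the address button takes priority when both buttons are pressed; neither detail is stated in this document. The class and method names are hypothetical.

```python
class MemoryModel:
    """Behavioral model of a 1024 x 16-bit memory with a latched address."""

    DEPTH = 1024

    def __init__(self):
        self.ram = [0] * self.DEPTH
        self.addr = 0                      # currently latched address

    def clock(self, sw, btn_addr=0, btn_write=0):
        """Apply one rising clock edge with the given switch/button state."""
        if btn_addr:
            # Assumption: only the low 10 switch bits form the address.
            self.addr = sw & (self.DEPTH - 1)
        elif btn_write:
            self.ram[self.addr] = sw & 0xFFFF

    @property
    def out(self):
        """Output follows the word stored at the latched address."""
        return self.ram[self.addr]


mem = MemoryModel()
mem.clock(0x0005, btn_addr=1)    # latch address 5
mem.clock(0xBEEF, btn_write=1)   # store 0xBEEF at address 5
print(hex(mem.out))              # prints 0xbeef
```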

## **Block Diagram**



Figure 2: Block Diagram of Memory

The memory\_FPGA module implements a 16-bit wide memory with read/write capability. The 16 switches (SW[15:0]) serve as either data or address inputs, depending on which button is pressed. The BTN\_addr button latches the switch value as the memory address, while BTN\_write writes the switch data into the selected location. The initialise signal can preload or reset memory contents. On each clock (clk), the chosen memory word is read and driven onto the 16-bit out port.

#### IO Table

| Direction | Name       | Width | Comments                                   |
|-----------|------------|-------|--------------------------------------------|
| Input     | clk        | 1     | Clock input to the registers               |
| Input     | BTN_addr   | 1     | Latches switch input as the memory address |
| Input     | BTN_write  | 1     | Latches switch input as the value input    |
| Input     | initialize | 1     | Initializes the registers with the values  |
| Input     | sw         | 16    | Provides data or address via switches      |
| Output    | out        | 16    | Outputs data read from memory              |

## **Quality of Result**

#### Congestion

The congestion analysis for the memory module shows no significant routing congestion within the FPGA fabric. Since the design primarily consists of a memory array with simple control logic (address latch, write enable, and output register), the resource utilization is low, and routing demand is minimal.



Image 13: Congestion report of Memory module

#### **Resource Utilisation**

The memory design shows moderate utilization: about 26.5% of the slice LUTs (5514 of 20800) and about 39.4% of the slice registers (16376 of 41600) are in use. No LUTs are configured as memory, which indicates that the design relies entirely on flip-flops for storage rather than distributed RAM. The F7 and F8 multiplexers (2164 and 1058, respectively) are most likely used to build the wide read multiplexer that selects one word out of the register array. Overall, the design fits within the FPGA capacity and leaves margin for additional modules or expansion.

Slice Logic

| Site Type             | Used  | Fixed | Prohibited | Available | Util% |
|-----------------------|-------|-------|------------|-----------|-------|
| Slice LUTs*           | 5514  | 0     | 0          | 20800     | 26.51 |
| LUT as Logic          | 5514  | 0     | 0          | 20800     | 26.51 |
| LUT as Memory         | 0     | 0     | 0          | 9600      | 0.00  |
| Slice Registers       | 16376 | 0     | 0          | 41600     | 39.37 |
| Register as Flip Flop | 16376 | 0     | 0          | 41600     | 39.37 |
| Register as Latch     | 0     | 0     | 0          | 41600     | 0.00  |
| F7 Muxes              | 2164  | 0     | 0          | 16300     | 13.28 |
| F8 Muxes              | 1058  | 0     | 0          | 8150      | 12.98 |

IO and GT Specific

| Site Type  | Used | Fixed | Prohibited | Available | Util% |
|------------|------|-------|------------|-----------|-------|
| Bonded IOB | 35   | 0     | 0          | 106       | 33.02 |

Primitives

| Ref Name | Used  | Functional Category |
|----------|-------|---------------------|
| FDRE     | 16376 | Flop & Latch        |
| LUT6     | 4420  | LUT                 |
| MUXF7    | 2164  | MuxFx               |
| MUXF8    | 1058  | MuxFx               |
| LUT4     | 543   | LUT                 |
| LUT5     | 428   | LUT                 |
| LUT2     | 140   | LUT                 |
| LUT3     | 46    | LUT                 |
| IBUF     | 19    | IO                  |
| OBUF     | 16    | IO                  |
| BUFG     | 1     | Clock               |

Image 14: Resource utilization for Memory module
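The reported percentages can be cross-checked with simple arithmetic, and the flip-flop count can be compared against the nominal capacity of the register array. The sketch below uses only the Used and Available figures from the utilization report; the helper name `util_percent` is hypothetical.

```python
def util_percent(used, available):
    """Utilization as a percentage, rounded to two decimals like the report."""
    return round(100.0 * used / available, 2)


# Nominal storage of a 1024-word, 16-bit register array, in bits.
nominal_bits = 1024 * 16
print(nominal_bits)                    # prints 16384

print(util_percent(5514, 20800))       # slice LUTs, prints 26.51
print(util_percent(16376, 41600))      # slice registers, prints 39.37
print(util_percent(35, 106))           # bonded IOBs, prints 33.02

# The report lists 16376 FDRE primitives, 8 fewer than the nominal bit count.
print(nominal_bits - 16376)            # prints 8
```

The flip-flop count is close to one register per stored bit, which is consistent with the array being built from flip-flops rather than distributed RAM or block RAM.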

## **Timing Summary**

The Design Timing Summary shows that all user-specified timing constraints are met. The setup slack (0.945 ns), hold slack (0.388 ns), and pulse width slack (4.500 ns) are positive, indicating there are no timing violations and the design is stable for implementation.



Image 15: Design Timing Summary for Memory module

#### **Screenshots**

### Mapping Diagram to LUTs (Synthesized design)



Image 16: Part of the LUT mapping diagram of the Memory module



Image 17: LUT mapping diagram of the Memory module

### Elaborated Circuit Schematic



Image 18: Entire Elaborated circuit for Memory Module

#### **Constraints**

XDC file: Xilinx Design Constraint File for Memory Implementation

The constraint file for the memory module specifies the mapping of FPGA pins to the inputs and outputs of the design. The on-board clock is used as the primary timing source to drive the memory operations. Switches are mapped to provide address and data inputs, while pushbuttons are assigned for latching the address and triggering write operations. The output of the memory is mapped to the FPGA LEDs for verification. At this stage, **no additional power or timing constraints** have been applied, and only essential pin assignments required for correct operation on the FPGA board are included.

## **Verification Strategy and Proof of Correctness**

The memory design was verified using a two-step approach. First, a simulation testbench was developed to apply various binary inputs, check address latching, and confirm correct read/write functionality across multiple test cases. Second, the design was implemented on the FPGA, where switches and pushbuttons were used to provide inputs, and the stored values were observed on the output LEDs. This ensured functional correctness in both simulation and hardware.

Testbench file: Testbench File for Memory Implementation



Image 20: Behavioral Simulation Result of Memory Implementation

FPGA Implementation video: Memory Implementation Video

Design File: Memory